FINITE SAMPLE PENALIZATION IN ADAPTIVE DENSITY DECONVOLUTION

Abstract. We consider the problem of estimating the density $g$ of identically distributed variables $X_i$, from a sample $Z_1, \dots, Z_n$ where $Z_i = X_i + \sigma\varepsilon_i$, $i = 1, \dots, n$, and $\sigma\varepsilon_i$ is a noise independent of $X_i$ with known density $\sigma^{-1} f_\varepsilon(\cdot/\sigma)$. We generalize adaptive estimators, constructed by a model selection procedure, described in Comte et al. (2005). We study numerically their properties in various contexts and we test their robustness. Comparisons are made with deconvolution kernel estimators, and the effects of misspecification of the errors and of dependency are examined. It appears that our estimation algorithm, based on a fast procedure, performs very well in all contexts.

1. Introduction

In this paper, we consider the problem of the nonparametric density deconvolution of $g$, the density of identically distributed variables $X_i$, from a sample $Z_1, \dots, Z_n$ in the model

(1) $Z_i = X_i + \sigma\varepsilon_i, \quad i = 1, \dots, n,$

where the $X_i$'s and $\varepsilon_i$'s are independent sequences, the $\varepsilon_i$'s are i.i.d. centered random variables with common density $f_\varepsilon$, that is, $\sigma\varepsilon_i$ is a noise with known density $\sigma^{-1} f_\varepsilon(\cdot/\sigma)$ and known noise level $\sigma$.

Due to the independence between the $X_i$'s and the $\varepsilon_i$'s, the problem is to estimate $g$ using the observations $Z_1, \dots, Z_n$ with common density $f_Z(z) = \sigma^{-1} g * f_\varepsilon(\cdot/\sigma)(z)$. The function $\sigma^{-1} f_\varepsilon(\cdot/\sigma)$ is often called the convolution kernel and is completely known here.

Denoting by $u^*$ the Fourier transform of $u$, it is well known that since $f_Z^*(\cdot) = g^*(\cdot) f_\varepsilon^*(\sigma\,\cdot)$, two factors determine the estimation accuracy in the standard density deconvolution problem: the smoothness of the density to be estimated, and that of the error density, both described by the rate of decay of their Fourier transforms. In this context, two classes of errors are usually considered: first the so-called "ordinary smooth" errors, with polynomial decay of their Fourier transform, and second the "super smooth" errors, with Fourier transform having an exponential decay.
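To make model (1) concrete, here is a minimal simulation sketch. It assumes a standard Gaussian density for $g$ (an arbitrary choice made only for illustration) and unit-variance noise of either the Laplace or the Gaussian type; the function name `simulate_sample` and its parameters are hypothetical and do not come from the paper.

```python
import numpy as np

def simulate_sample(n, sigma, noise="laplace", seed=0):
    """Draw Z_i = X_i + sigma * eps_i, with X_i ~ N(0, 1) as an illustrative g."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    if noise == "laplace":
        # A Laplace law with scale 1/sqrt(2) has variance 2 * scale**2 = 1.
        eps = rng.laplace(loc=0.0, scale=1.0 / np.sqrt(2.0), size=n)
    elif noise == "gaussian":
        eps = rng.standard_normal(n)
    else:
        raise ValueError("noise must be 'laplace' or 'gaussian'")
    return x, x + sigma * eps

x, z = simulate_sample(n=20000, sigma=0.5)
# Var(Z) should be close to Var(X) + sigma^2 since X and eps are independent.
print(np.var(x), np.var(z))
```

Only the $Z_i$'s would be available to the statistician; the $X_i$'s are returned here so that the effect of the noise can be inspected.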

1 Université Paris V, MAP5, UMR CNRS 8145.

3 IUT de Paris V et Université d'Orsay, Laboratoire de Probabilités, Statistique et Modélisation, UMR 8628.


F. COMTE, Y. ROZENHOLC, AND M.-L. TAUPIN 



For further references about density deconvolution see e.g. Carroll and Hall (1988), Devroye (1989), Fan (1991a, b), Liu and Taylor (1989), Masry (1991, 1993a, b), Stefanski (1990), Stefanski and Carroll (1990), Taylor and Zhang (1990), Zhang (1990), Cator (2001), Pensky and Vidakovic (1999), Pensky (2002), Fan and Koo (2002), Butucea (2004), Butucea and Tsybakov (2004) and Koo (1999).

The aim of the present paper is to provide a complete simulation study of the deconvolution estimator constructed by a penalized contrast minimization on a model $S_m$, a space of square integrable functions having a Fourier transform with compact support included in $[-\ell_m, \ell_m]$ with $\ell_m = \pi L_m$. Comte et al. (2005) show that for $L_m$ being a positive integer, this penalized contrast minimization selects the relevant projection space $S_m$ without any prior information on the unknown density $g$. In most cases, it is an adaptive estimator in the sense that it achieves the optimal rate of convergence in the minimax sense, studied by Fan (1991a), Butucea (2004) and Butucea and Tsybakov (2004). It is noteworthy that, contrary to what usually happens, $\ell_m$ does not correspond here to the dimension of the projection space but to the length of the support of the Fourier transform of the functions of $S_m$. Thus we will refer in the following to $\ell_m$ as the "length" of the model $S_m$.

Moreover, in the context of integer $L_m$, Comte et al. (2005) provide a brief simulation study which shows that the selected $L_m$ are rather small and therefore far from the asymptotic regime. Our present study shows that it is relevant to choose $\ell_m = \pi L_m$ on a thinner grid than one included in $\pi\mathbb{N}$.

Thus we start by stating a modification of the results in Comte et al. (2005) to take into account this thinner grid of values $\ell_m$, and we show that the resulting penalized minimum contrast estimator is an adaptive estimator in the sense that it achieves the optimal rate of convergence in the minimax sense. Here, the penalty depends on the smoothness of the errors density and therefore we consider two cases: Laplace density (ordinary smooth) and Gaussian density (super smooth).

We illustrate, through examples, the influence of over-penalization and under-penalization 
and propose practical calibrations of the penalty in all considered cases. 

Then we study, through large simulation experiments, the non-asymptotic properties of our estimator by considering various types of densities $g$ with various smoothness properties, such as the Cauchy distribution, the Gaussian density and finally Fejér-de la Vallée-Poussin-type densities.

We present some examples that illustrate how the algorithm works. We give the mean integrated squared error (MISE) for the two types of errors density, for all the test densities, for various $\sigma$, and for various sample sizes. Our results present global tables of MISE and comparisons between the MISE and the theoretical expected rates of convergence.

Lastly, the robustness of our procedure is tested in various ways: when the observations are dependent, when $\sigma$ is very small (leading to a problem of density estimation) and when the errors density $f_\varepsilon$ is misspecified or not taken into account. In those cases, we compare our procedure with previous results of Delaigle and Gijbels (2004a, 2004b) and Dalelane (2004) (direct density estimation).

The conclusions of our study are the following. Our estimation procedure provides very good results, better than the kernel deconvolution methods described and studied in Delaigle and Gijbels (2004a). Our estimation procedure is robust when the $Z_i$'s are no longer independent and even not strongly mixing. We underline the importance of the noise level in the quality of estimation, and we check that, in the case of a very small noise, we obtain MISE's that have the same order as some recent results obtained by Dalelane (2004) for direct density estimation. Lastly our results show that a misspecification of the errors density slightly increases the error of estimation, but less than the use of the direct density estimator (without deconvolving), as was already mentioned in Hesse (1999). From a practical point of view it is important to note that our algorithm is a fast algorithm ($O(n\ln(n))$ operations) based on the Inverse Fast Fourier Transform (IFFT).

The paper is organized as follows. In Section 2, we present the model, the assumptions, the adaptive estimator and its expected rates of convergence. In Section 3, we describe the implementation of the estimates (see Section 3.2) and the computation of the associated integrated squared errors (Section 3.3). Section 4 presents the chosen penalties (see Section 4.2) and describes the framework of our simulations. The simulation results are gathered in Section 5 and an appendix is devoted to the proof of our theorem.



2. General framework and theoretical results 

2.1. Notations and assumptions. For $u$ and $v$ two square integrable functions, we denote by $u^*$ the Fourier transform of $u$, $u^*(x) = \int e^{itx} u(t)\,dt$, and by $u * v$ the convolution product, $u * v(x) = \int u(y) v(x - y)\,dy$. Moreover, we denote $\|u\|^2 = \int_{\mathbb{R}} |u(x)|^2\,dx$.
Consider Model (1) under the following assumptions.

(A1) The $X_i$'s and the $\varepsilon_i$'s are independent and identically distributed random variables and the sequences $(X_i)_{i\in\mathbb{N}}$ and $(\varepsilon_i)_{i\in\mathbb{N}}$ are independent.
(A2) The density $f_\varepsilon$ belongs to $L^2(\mathbb{R})$ and is such that for all $x \in \mathbb{R}$, $f_\varepsilon^*(x) \neq 0$.



Under assumption (A1), the $Z_i$'s are independent and identically distributed random variables.

Assumption (A2), usual for the construction of an estimator in density deconvolution, ensures that $g$ is identifiable.

The rate of convergence for estimating $g$ is strongly related to the rate of decrease of the Fourier transform of the errors density, $f_\varepsilon^*(x)$, as $x$ goes to infinity. More precisely, the smoother $f_\varepsilon$, the quicker the rate of decay of $f_\varepsilon^*$ and the slower the rate of convergence for estimating $g$. Indeed, if $f_\varepsilon$ is very smooth, so is $f_Z$, the density of the observations $Z$, and thus it is difficult to recover $g$. This decrease of $f_\varepsilon^*$ is described by the following assumption.

(A3) There exist nonnegative real numbers $\gamma$, $\mu$, and $\delta$ such that

$|f_\varepsilon^*(x)| \ge \kappa_0 (x^2 + 1)^{-\gamma/2} \exp\{-\mu |x|^{\delta}\}.$



When $\delta = 0$ in assumption (A3), $f_\varepsilon$ is usually called "ordinary smooth", and when $\mu > 0$ and $\delta > 0$, the error density is usually called "super smooth". Indeed densities satisfying assumption (A3) with $\delta > 0$ and $\mu > 0$ are infinitely differentiable. For instance, Gaussian or Cauchy distributions are super smooth of order $\gamma = 0, \delta = 2$ and $\gamma = 0, \delta = 1$ respectively, and the symmetric exponential (also called Laplace) distribution, with $\delta = 0 = \mu$ and $\gamma = 2$, is an ordinary smooth density. Furthermore, when $\delta = 0$, (A2) requires that $\gamma > 1/2$ in (A3). By convention, we set $\mu = 0$ when $\delta = 0$ and we assume that $\mu > 0$ when $\delta > 0$. In the same way, if $\sigma = 0$, the $X_i$'s are directly observed without noise and we set $\mu = \gamma = \delta = 0$.

For the construction of the estimator we need the following more technical assumption.

(A4) The density $g$ belongs to $L^2(\mathbb{R})$ and there exists some positive real $M_2$ such that $\int x^2 g^2(x)\,dx \le M_2 < \infty$.

This assumption (A4), quite unusual but unrestrictive, already appears in density deconvolution in a slightly different way in Pensky and Vidakovic (1999) who assume, instead of (A4), that $\sup_{x\in\mathbb{R}} |x|\,g(x) < \infty$. The main drawback of this condition is that it is not stable under translation, but an empirical centering of the data seems to avoid practical problems.

Since rates of convergence depend on the smoothness of $g$ we introduce regularity conditions.

(R1) There exist some positive real numbers $s$, $r$, $b$ such that the density $g$ belongs to

$\mathcal{S}_{s,r,b}(C_1) = \left\{ t \text{ density} : \int_{-\infty}^{+\infty} |t^*(x)|^2 (x^2 + 1)^s \exp\{2b|x|^r\}\,dx \le C_1 \right\}.$

(R2) There exist some positive real numbers $C_2$ and $d$ such that the density $g$ belongs to

$\mathcal{S}_d(C_2) = \left\{ t \text{ density such that for all } x \in \mathbb{R},\ |t^*(x)| \le C_2 \mathbf{1}_{[-d,d]}(x) \right\}.$

Note that densities satisfying (R1) with $r = 0$ belong to some Sobolev class of order $s$, whereas densities satisfying (R1) with $r > 0$, $b > 0$ are infinitely differentiable. Moreover, such densities admit analytic continuation on a finite width strip when $r = 1$ and on the whole complex plane if $r = 2$. The densities satisfying (R2), often called entire functions, admit analytic continuation on the whole complex plane (see Ibragimov and Hasminskii (1983)).

In order to clarify the notations, we denote by Greek letters the parameters related to the known distribution of the noise $\varepsilon$ and by Latin letters the parameters related to the unknown distribution $g$ of $X$.

Let us now present and motivate the estimator.

2.2.1. Projection spaces. Let $\varphi(x) = \sin(\pi x)/(\pi x)$ and $\varphi_{m,j}(x) = \sqrt{L_m}\,\varphi(L_m x - j)$. Using that $\{\varphi_{m,j}\}_{j\in\mathbb{Z}}$ is an orthonormal basis of the space of square integrable functions having a Fourier transform with compact support included in $[-\pi L_m, \pi L_m] = [-\ell_m, \ell_m]$ (see Meyer (1990)), we denote by $S_m$ such a space and consider the collection of linear spaces $(S_m)_{m\in\mathcal{M}_n}$, with $\ell_m = m\Delta$, $\Delta > 0$, and $m \in \mathcal{M}_n$ with $\mathcal{M}_n = \{1, \dots, m_n\}$, as projection spaces. Consequently,

$S_m = \mathrm{Vect}\{\varphi_{m,j},\ j \in \mathbb{Z}\} = \{f \in L^2(\mathbb{R}),\ \mathrm{supp}(f^*) \subset [-\ell_m, \ell_m]\},$

and the orthogonal projection of $g$ on $S_m$, $g_m$, is given by $g_m = \sum_{j\in\mathbb{Z}} a_{m,j}\varphi_{m,j}$, with $a_{m,j} = \langle \varphi_{m,j}, g \rangle$. Since this orthogonal projection involves infinite sums, we consider in practice the truncated spaces $S_m^{(n)}$ defined as

$S_m^{(n)} = \mathrm{Vect}\{\varphi_{m,j},\ |j| \le K_n\},$

where $K_n$ is an integer to be chosen later. Associated to those spaces we consider the orthogonal projection of $g$ on $S_m^{(n)}$, denoted by $g_m^{(n)}$ and given by $g_m^{(n)} = \sum_{|j|\le K_n} a_{m,j}\varphi_{m,j}$ with $a_{m,j} = \langle \varphi_{m,j}, g \rangle$.

2.2.2. The non penalized estimators. Associate to this collection of models the following contrast function, for $t$ belonging to some $S_m$ of the collection $(S_m)_{m\in\mathcal{M}_n}$:

$\gamma_n(t) = \|t\|^2 - \frac{2}{n}\sum_{i=1}^{n} u_t^*(Z_i), \quad \text{with } u_t(x) = \frac{1}{2\pi}\,\frac{t^*(-x)}{f_\varepsilon^*(\sigma x)}.$

Since $E[u_t^*(Z_i)] = \langle t, g \rangle$, we find that $E(\gamma_n(t)) = \|t - g\|^2 - \|g\|^2$, which is minimum when $t = g$. Since $\gamma_n(t)$ estimates the $L^2$ distance between $t$ and $g$, it is well adapted for estimating $g$. Associated to the collection of models, the collection of the non penalized estimators $\hat g_m^{(n)}$ is defined by

(2) $\hat g_m^{(n)} = \arg\min_{t \in S_m^{(n)}} \gamma_n(t).$

By using that $t \mapsto u_t$ is linear, and that $\{\varphi_{m,j}\}_{|j|\le K_n}$ is an orthonormal basis of $S_m^{(n)}$, we have $\hat g_m^{(n)} = \sum_{|j|\le K_n} \hat a_{m,j}\varphi_{m,j}$ where $\hat a_{m,j} = n^{-1}\sum_{i=1}^{n} u^*_{\varphi_{m,j}}(Z_i)$, with $E(\hat a_{m,j}) = \langle g, \varphi_{m,j} \rangle = a_{m,j}$.

2.2.3. The adaptive estimator. The adaptive estimator is computed by using the following penalized criterion

(3) $\tilde g = \hat g_{\hat m}^{(n)}$ with $\hat m = \arg\min_{m \in \mathcal{M}_n} \left[ \gamma_n(\hat g_m^{(n)}) + \mathrm{pen}(\ell_m) \right],$

where pen(.) is a penalty function based on the observations and the known distribution of $\sigma\varepsilon_1$, without any prior information on $g$.

2.3. Rate of convergence of the non adaptive estimator. We recall here, using our setup, the bound for the risk of $\hat g_m^{(n)}$, proved in Comte et al. (2005).

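(4) $E(\|g - \hat g_m^{(n)}\|^2) \le \|g - g_m\|^2 + \|g_m - g_m^{(n)}\|^2 + \frac{\ell_m}{2\pi n}\int_{-1}^{1} \frac{dx}{|f_\varepsilon^*(\sigma\ell_m x)|^2}.$

First, the variance term

$\frac{\ell_m}{2\pi n}\int_{-1}^{1} \frac{dx}{|f_\varepsilon^*(\sigma\ell_m x)|^2}$

depends, as usual in deconvolution problems, on the rate of decay of the Fourier transform of $f_\varepsilon$,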
with larger variance for smoother f e . Under assumption ( A| |, for £ m > the variance term 
satisfies 



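$\frac{\ell_m}{2\pi n}\int_{-1}^{1} \frac{dx}{|f_\varepsilon^*(\sigma\ell_m x)|^2} \le \lambda_1\,\ell_m^{2\gamma+1-\delta}\exp(2\mu(\sigma\ell_m)^{\delta})/n,$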



where 



(5) 



Ai 



Ct-2 + /-2>)7 

^ j and a, 5) 



1 

2fi5a s 
[ 2^a s 



if 5 = 
if < 5 < 1 
if 5 > 1. 



Second, under assumption (A4), $\|g_m^{(n)} - g_m\|^2$ is of order $(M_2 + 1)\ell_m^2/(\pi^2 K_n)$. Consequently, under (A4), $K_n \ge (M_2 + 1)n$ ensures that the risk $E(\|g - \hat g_m^{(n)}\|^2)$ has the order

$\|g - g_m\|^2 + (2\lambda_1 + 1)\,\ell_m^{2\gamma+1-\delta}\exp\{2\mu\sigma^{\delta}\ell_m^{\delta}\}/n.$

Finally, the bias term $\|g - g_m\|^2$ depends on the smoothness of the function $g$ and has the expected order for classical smoothness classes since it is given by the distance between $g$ and the classes of entire functions having Fourier transform compactly supported on $[-\ell_m, \ell_m]$ (see Ibragimov and Hasminskii (1983)).

If $g$ satisfies (R2), then the bias term $\|g - g_m\|^2 = 0$, by choosing $\ell_m = d$. It follows that in that case the parametric rate of convergence for estimating $g$ is achieved.

If $g$ belongs to some $\mathcal{S}_{s,r,b}(C_1)$ defined by (R1), then the squared bias term can be evaluated by using that

$\|g - g_m\|^2 = \frac{1}{2\pi}\int_{|x| \ge \ell_m} |g^*(x)|^2\,dx \le \frac{C_1}{2\pi}(\ell_m^2 + 1)^{-s}\exp\{-2b\ell_m^{r}\}.$

Consequently, under (A4), if $K_n \ge (M_2 + 1)n$, the rate of convergence of $\hat g_m^{(n)}$ is obtained by selecting the space $S_m^{(n)}$, and thus $\ell_m$, that minimizes

$\frac{C_1}{2\pi}(\ell_m^2 + 1)^{-s}\exp\{-2b\ell_m^{r}\} + (2\lambda_1 + 1)\,\ell_m^{2\gamma+1-\delta}\exp\{2\mu\sigma^{\delta}\ell_m^{\delta}\}/n.$

One can see that if $\ell_m$ becomes too large, the risk explodes, due to the presence of the second term. Hence $\ell_m$ appears to be the cut between the relevant low frequencies used in the Fourier transforms to compute the estimate and the high frequencies which are not used (and may even degrade the quality of the risk).

We give the resulting rates in Table 1. For a density $g$ satisfying (R1), rates are, in most cases, known to be the optimal ones in the minimax sense (see Fan (1991a), Butucea (2004), Butucea and Tsybakov (2004)). We refer to Comte et al. (2005) for further discussion about optimality.





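                             | $f_\varepsilon$: $\delta = 0$ (ordinary smooth) | $f_\varepsilon$: $\delta > 0$ (super smooth)
$g$: $r = 0$ (Sobolev($s$))  | $\ell_m = O(n^{1/(2s+2\gamma+1)})$; rate $= O(n^{-2s/(2s+2\gamma+1)})$; optimal rate | $\ell_m = [\ln(n)/(2\mu\sigma^{\delta} + 1)]^{1/\delta}$; rate $= O((\ln(n))^{-2s/\delta})$; optimal rate
$g$: $r > 0$                 | $\ell_m = [\ln(n)/2b]^{1/r}$; rate $= O(\ln(n)^{(2\gamma+1)/r}/n)$; optimal rate | $\ell_m$ implicit solution of $\ell_m^{2s+2\gamma+1-r}\exp\{2\mu(\sigma\ell_m)^{\delta} + 2b\ell_m^{r}\} = O(n)$; optimal rate if $r < \delta$

Table 1. Optimal choice of the length ($\ell_m$) and resulting (optimal) rates.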
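The bias-variance trade-off behind these rates can be illustrated numerically. The sketch below assumes a standard Gaussian $g$ (so that $|g^*(x)|^2 = e^{-x^2}$ and the squared bias has a closed form) and unit-variance Laplace errors with $f_\varepsilon^*(x) = (1 + x^2/2)^{-1}$; it then minimizes the sum of the squared bias and of the variance term over a grid $\ell_m = m\Delta$. The function names and the grid bound `m_max` are hypothetical, and the truncation term $\|g_m - g_m^{(n)}\|^2$ is ignored.

```python
import math

def risk_bound(ell, n, sigma):
    """Squared bias + variance term for g = N(0, 1) and unit-variance Laplace noise."""
    # (1 / (2 pi)) * int_{|x| >= ell} exp(-x^2) dx = erfc(ell) / (2 sqrt(pi))
    bias2 = math.erfc(ell) / (2.0 * math.sqrt(math.pi))
    # (ell / (2 pi n)) * int_{-1}^{1} (1 + (sigma ell x)^2 / 2)^2 dx, in closed form
    s2 = (sigma * ell) ** 2
    variance = ell / (math.pi * n) * (1.0 + s2 / 3.0 + s2 ** 2 / 20.0)
    return bias2 + variance

def best_length(n, sigma, delta=0.1, m_max=100):
    """Return the grid value ell = m * delta that minimizes the risk bound."""
    grid = [m * delta for m in range(1, m_max + 1)]
    return min(grid, key=lambda ell: risk_bound(ell, n, sigma))

for n in (100, 1000, 100000):
    print(n, best_length(n, sigma=0.5))
```

The minimizing length grows slowly with $n$, and the bound increases again once $\ell_m$ passes the minimizer, which is the explosion of the risk described above.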



In the case $\delta > 0$, $r > 0$, the rates are not explicitly given in a general setting. For instance, if $r = \delta$, the rate is of order

(6) [ln(n)] b n- b ^ b+ ^ with b = [-2sfia 5 + (2 7 - r + l)b}/[r(fia 5 + b)}. 
On the other hand, if $r/\delta \le 1/2$, then the rate is given by

(7) ln(n)- 2s / 5 exp 

Remark 2.1. First, it is important to note that the condition $K_n \ge (M_2 + 1)n$ allows us to construct truncated spaces $S_m^{(n)}$ using $O(n)$ basis vectors and hence to construct a tractable and fast algorithm from a practical point of view (see Section 3). Second, the choice of a larger $K_n$ does not change the efficiency of our estimator from a statistical point of view but only changes the speed of the algorithm.

2.4. Rate of convergence of the adaptive estimator. The following theorem is an extension of Theorems 4.1 and 4.2 in Comte et al. (2005). This new version states that, for any fixed $\Delta$, we can take $\ell_m = m\Delta$, with $m = 1, \dots, m_n$, instead of $\ell_m = m\pi$.

Theorem 2.1. Consider the model described in Section 2.1 under (A1), (A2), (A3) and (A4), and the collection of estimators $\hat g_m^{(n)}$ defined by (2) with $\ell_m = m\Delta$ for $m = 1, \dots, m_n$. Let $\lambda_1$ and $\lambda_2$ be two constants depending on $\gamma$, $\kappa_0$, $\mu$, $\delta$ and $\sigma$. Let $\kappa$ be some numerical constant, not necessarily the same in each case. Consider

1) $\mathrm{pen}(\ell_m) \ge \kappa\lambda_1\,\ell_m^{2\gamma+1-\delta}\exp\{2\mu(\sigma\ell_m)^{\delta}\}/n$, if $0 \le \delta < 1/3$,

2) $\mathrm{pen}(\ell_m) \ge \kappa[\lambda_1 + \mu\sigma^{1/3}\pi^{1/3}\lambda_2]\,\ell_m^{2\gamma+2/3}\exp\{2\mu(\sigma\ell_m)^{1/3}\}/n$, if $\delta = 1/3$,

3) $\mathrm{pen}(\ell_m) \ge \kappa[\lambda_1 + \mu\pi^{\delta}\lambda_2]\,\ell_m^{2\gamma+(1/2+\delta/2)\wedge 1}\exp\{2\mu(\sigma\ell_m)^{\delta}\}/n$, if $\delta > 1/3$,

then, if $K_n \ge (M_2 + 1)n$ and $m_n$ is such that $\mathrm{pen}(\ell_{m_n})$ is bounded, the estimator $\tilde g = \hat g_{\hat m}^{(n)}$ defined by (3) satisfies

(8) $E(\|g - \tilde g\|^2) \le C \inf_{m \in \{1,\dots,m_n\}} \left[ \|g - g_m\|^2 + \mathrm{pen}(\ell_m) \right] + \frac{c}{n\Delta},$

where $C$ and $c$ are constants depending on $f_\varepsilon$.

In the first two cases, the lower bound of the penalty has the same order as the variance term and the risk of the adaptive estimator $\tilde g$ has the order of the smallest risk among the estimators of the collection $\hat g_m^{(n)}$. Hence we get a statistical procedure that is adaptive to the smoothness of $g$ and that can choose the optimal $\ell_m$ in a purely data-driven way, up to the knowledge of $M_2$ through the choice of $K_n \ge (M_2 + 1)n$.

In the last case, a small loss of order $\ell_m^{(3\delta/2 - 1/2)\wedge\delta}$ may occur. Nevertheless, this loss does not affect the rate of convergence if the bias is the dominating term, that is when $\delta > 1/3$ and $0 \le r < \delta$. This loss changes the rate only when the variance is the dominating term, that is when $1/3 < \delta \le r$ (and consequently when the considered $\ell_m$ are powers of $\ln(n)$). When $1/3 < \delta \le r$, the rate is faster than logarithmic, and only a logarithmic loss occurs, as a price to pay for adaptation. This loss occurs in particular when both the density $g$ to be estimated and the density of the errors $f_\varepsilon$ are Gaussian.

The interest of taking $\ell_m = m\Delta$ lies in the possibility of choosing the best $\ell_m$ among more values. Nevertheless, the theorem highlights that too small $\Delta$'s make the remainder term $c/(n\Delta)$ larger. For instance, according to Table 1, when $g$ satisfies (R1), we can choose $\Delta = 1/\ln(n)$ and, when $r < 2$, since $\gamma > 1/2$ (in order to guarantee that $f_\varepsilon$ belongs to $L^2(\mathbb{R})$), we do not lose anything in terms of rate of convergence. Clearly if $g$ is an entire function satisfying (R2), $\Delta$ has to be fixed. Since we do not know in which smoothness class the true density is, the only strategy ensuring that the good rate is achieved is to take a fixed $\Delta$.



3. Estimates and associated MISE implementation 

3.1. Steps of the simulations. Given a density $g$, a distribution of error $\varepsilon$, a sample size $n$, and a value of $\sigma$, we sample the $Z_i$'s and do the following steps:
- compute the estimators via their coefficients $(\hat a_{m,j})$;
- compute the contrast using that

$\gamma_n(\hat g_m^{(n)}) = -\sum_{|j| \le K_n} |\hat a_{m,j}|^2 = -\|\hat g_m^{(n)}\|^2;$

- minimize $\gamma_n(\hat g_m^{(n)}) + \mathrm{pen}(\ell_m)$ and deduce the selected $\hat m$ and the associated $\tilde g = \hat g_{\hat m}^{(n)}$;
- evaluate the estimation error by a computation of the integrated squared error (ISE), $\|g - \tilde g\|^2$;
- repeat all the previous steps 1000 times and compute an empirical version of the MISE, $E\|g - \tilde g\|^2$.
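The first three steps can be sketched as follows. This is a simplified illustration, not the paper's implementation: it assumes unit-variance Laplace errors, computes the coefficients through the integral form $\hat a_{m,j} = \frac{\sqrt{\ell_m/\pi}}{2}\int_{-1}^{1} e^{-ij\pi y}\,\psi_Z(\ell_m y)/f_\varepsilon^*(\sigma\ell_m y)\,dy$ with a trapezoidal rule instead of the IFFT, and uses a placeholder penalty equal to an arbitrary constant `kappa` times the variance term rather than a calibrated penalty. All function names and default values are hypothetical.

```python
import numpy as np

def laplace_cf(x):
    """Fourier transform of the unit-variance Laplace density: 1 / (1 + x^2 / 2)."""
    return 1.0 / (1.0 + x**2 / 2.0)

def contrast(z, ell, sigma, k_max, n_grid=1025):
    """Return gamma_n(g_hat_m) = -sum_{|j| <= k_max} |a_hat_{m,j}|^2 for length ell."""
    y = np.linspace(-1.0, 1.0, n_grid)
    w = np.full(n_grid, y[1] - y[0])
    w[[0, -1]] *= 0.5                                      # trapezoidal weights
    psi = np.exp(1j * np.outer(ell * y, z)).mean(axis=1)   # empirical Fourier transform
    h = psi / laplace_cf(sigma * ell * y)                  # deconvolved transform
    j = np.arange(-k_max, k_max + 1)
    a_hat = 0.5 * np.sqrt(ell / np.pi) * (np.exp(-1j * np.pi * np.outer(j, y)) * h) @ w
    return -np.sum(np.abs(a_hat) ** 2)

def select_length(z, sigma, kappa=2.0, delta=0.1, m_max=40, k_max=64):
    """Pick ell_m = m * delta minimizing contrast + placeholder penalty."""
    n = len(z)
    best = None
    for m in range(1, m_max + 1):
        ell = m * delta
        s2 = (sigma * ell) ** 2
        # Placeholder penalty: kappa times the Laplace variance term (an assumption).
        pen = kappa * ell / (np.pi * n) * (1.0 + s2 / 3.0 + s2**2 / 20.0)
        crit = contrast(z, ell, sigma, k_max) + pen
        if best is None or crit < best[0]:
            best = (crit, ell)
    return best[1]

rng = np.random.default_rng(1)
n, sigma = 500, 0.3
z = rng.standard_normal(n) + sigma * rng.laplace(scale=1.0 / np.sqrt(2.0), size=n)
ell_hat = select_length(z, sigma)
print(ell_hat)
```

The direct quadrature costs far more than the IFFT route and is only meant to make the structure of the selection step visible.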



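3.2. Computation of the estimators. We fixed arbitrarily $\Delta = 1/10$. Given the data $Z_1, \dots, Z_n$, we need to compute, for several values of $\ell_m = \Delta, 2\Delta, \dots$, the coefficients of the estimate $\hat g_m^{(n)}$, $\hat g_m^{(n)} = \sum_{|j| \le K_n} \hat a_{m,j}\varphi_{m,j}$, $\varphi_{m,j}(x) = \sqrt{L_m}\,\varphi(L_m x - j)$ with $\varphi(x) = \sin(\pi x)/(\pi x)$. Since

$\hat a_{m,j} = \frac{1}{n}\sum_{k=1}^{n} u^*_{\varphi_{m,j}}(Z_k) = \frac{1}{n}\sum_{k=1}^{n} \int e^{ixZ_k}\,u_{\varphi_{m,j}}(x)\,dx,$

we get, denoting by $\psi_Z(x) = n^{-1}\sum_{k=1}^{n} e^{ixZ_k}$ the empirical Fourier transform of $f_Z(\cdot) = \sigma^{-1} g * f_\varepsilon(\cdot/\sigma)$, that

$\hat a_{m,j} = \frac{1}{n}\sum_{k=1}^{n} \frac{1}{2\pi\sqrt{L_m}}\int_{-\pi L_m}^{\pi L_m} \frac{e^{ix(Z_k - j/L_m)}}{f_\varepsilon^*(\sigma x)}\,dx = \frac{\sqrt{L_m}}{2}\int_{-1}^{1} e^{-ij\pi x}\,\frac{\psi_Z(\ell_m x)}{f_\varepsilon^*(\sigma\ell_m x)}\,dx.$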

To compute integrals of type $2^{-1}\int_{-1}^{1} e^{ij\pi x}u(x)\,dx$, we use their approximations via Riemann sums:

(9) $\frac{1}{N}\sum_{k=0}^{N-1} e^{ij\pi(-1 + 2k/N)}\,u\!\left(-1 + \frac{2k}{N}\right).$

Note that the IFFT (Inverse Fast Fourier Transform) Matlab function is defined as the function which associates to a vector $(X(1), \dots, X(N))'$ a vector $(Y(1), \dots, Y(N))'$ such that, for $N = 2^M$,

(10) $Y(j) = \frac{1}{N}\sum_{k=1}^{N} X(k)\,e^{2i\pi(j-1)(k-1)/N} = \frac{1}{N}\sum_{k=0}^{N-1} X(k+1)\,e^{2i\pi(j-1)k/N}.$

Hence, for $X(k) = (\psi_Z/f_\varepsilon^*(\sigma\,\cdot))(2(k-1)\ell_m/N)$ for $k = 1, \dots, N$ and for $Y = (Y_1, \dots, Y_N)' = \mathrm{IFFT}(X)$, we get $\hat a_{m,j} = Y_{j+1}\sqrt{\ell_m/\pi}$ for $j = 0, \dots, N-1 = 2^M - 1$. The quantity to be chosen is $M$ such that $K_n = 2^M - 1 \ge (M_2 + 1)n$. Indeed the coefficients $\hat a_{m,j}$ can be computed by using this IFFT with $K_n = N - 1 = 2^M - 1$ and with adequate shifts. In that way, the quantity $\|g_m - g_m^{(n)}\|^2$ is always negligible with respect to the others.
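The link between Riemann sums of the form $\frac{1}{N}\sum_{k=0}^{N-1} e^{ij\pi x_k}u(x_k)$, with $x_k = -1 + 2k/N$, and the IFFT convention can be checked numerically, since $e^{ij\pi x_k} = (-1)^j e^{2i\pi jk/N}$. The sketch below uses NumPy's `numpy.fft.ifft`, which follows the same $1/N$ normalization and sign convention as the Matlab definition quoted above; the test function `u` and the helper names are arbitrary choices made for the illustration.

```python
import numpy as np

def riemann_sum(u, j, n_points):
    """(1/N) * sum_{k=0}^{N-1} exp(i j pi x_k) u(x_k), with x_k = -1 + 2k/N."""
    k = np.arange(n_points)
    x = -1.0 + 2.0 * k / n_points
    return np.mean(np.exp(1j * j * np.pi * x) * u(x))

def riemann_sums_by_ifft(u, n_points):
    """The same sums for j = 0, ..., N-1 at once, through a single IFFT."""
    k = np.arange(n_points)
    x = -1.0 + 2.0 * k / n_points
    signs = (-1.0) ** np.arange(n_points)   # the factor exp(-i j pi) = (-1)^j
    return signs * np.fft.ifft(u(x))

def u(x):
    """Arbitrary smooth complex-valued test function."""
    return np.exp(-x**2) * (1.0 + 0.5j * x)

N = 2**8
direct = np.array([riemann_sum(u, j, N) for j in range(N)])
fast = riemann_sums_by_ifft(u, N)
print(np.max(np.abs(direct - fast)))
```

The direct evaluation needs $O(N^2)$ operations while the IFFT needs $O(N\ln N)$, which is the source of the speed of the procedure.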

One should take $M \ge \log_2(n + 1)$. After checking that the choice of larger values (up to 11) does not change the estimation quality, we finally choose $M = 8$.

3.3. Computation of the integrated squared error (ISE), $\|g - \tilde g\|^2$. We have two different ways of computing the integrated squared error $\|g - \tilde g\|^2$.

(E1) Standard approximation and discretization of the integral on an interval of $\mathbb{R}$, as is done in Delaigle and Gijbels (2004a) and Dalelane (2004). In order to compare our results to theirs, we carry out this evaluation on the same intervals.

Since this evaluation on a finite interval may lead to an under-valuation of the ISE, we also propose an exact calculation of the ISE on $\mathbb{R}$ as described in the following.

(E2) Evaluation of the ISE on the whole real line. We use the decomposition

$\|g - \hat g_m^{(n)}\|^2 = \|g - g_m\|^2 + \|g_m - g_m^{(n)}\|^2 + \|g_m^{(n)} - \hat g_m^{(n)}\|^2.$

In the cases we consider, $g^*$ is available and the bias term is computed by using the standard formula $\|g - g_m\|^2 = (1/(2\pi))\int_{|x| \ge \ell_m} |g^*(x)|^2\,dx$. We bound $\|g_m - g_m^{(n)}\|^2$ by a term of order $\ell_m^2/K_n$, that is of order $\ell_m^2/2^M$. Finally, the variance term $\|g_m^{(n)} - \hat g_m^{(n)}\|^2$ is calculated using that

$\|g_m^{(n)} - \hat g_m^{(n)}\|^2 = \sum_{|j| \le K_n} |a_{m,j} - \hat a_{m,j}|^2.$

Consequently, we need the computation of $a_{m,j} = \sqrt{\ell_m}/(2\sqrt{\pi})\int_{-1}^{1} e^{-ij\pi x} g^*(\ell_m x)\,dx$, the coefficients of the expansion of the projection $g_m^{(n)} = \sum_{|j| \le K_n} a_{m,j}\varphi_{m,j}$ on $S_m^{(n)}$. Again, using IFFT (see (9) and (10)), with $G = (G_1, \dots, G_N)$ and $G_k = g^*(2(k-1)\pi/N)$ for $k = 1, \dots, N$, we get $G^* = (G_1^*, \dots, G_N^*)' = \mathrm{IFFT}(G)$. Then $a_{m,j} = \sqrt{\ell_m/\pi}\,G^*_{j+1}$ for $j = 0, \dots, N-1 = 2^M - 1$. This second method requires the knowledge of $g^*$ and is unavoidable for stable distributions for which the analytical form of $g$ is not available.

Remark 3.1. Speed of the algorithm: Since the IFFT is a fast algorithm, the computation of our estimates is also fast and requires only $O(2^M\ln(2^M)) = O(n\ln(n))$ operations if $K_n = 2^M - 1$ is of order $n$.

4. The practical framework 

4.1. Description of the test densities g. We consider several types of densities $g$, and for each density, we give the interval $I$ on which the ISE is computed by the method (E1), which is used in all examples except for stable distributions, where the use of method (E2) is unavoidable. The set of test densities can be split into three subsets. First we consider densities having classical smoothness properties like Hölderian smoothness with polynomial decay of their Fourier transform. Second we consider densities having stronger smoothness properties, with exponential decay of the Fourier transform. And finally we consider densities with Fourier transform compactly supported, that is, satisfying Condition (R2).

Except in the case of densities leading to infinite variance, we consider density functions $g$ normalized with unit variance so that $1/\sigma^2$ represents the usual signal-to-noise ratio (variance of the signal divided by the variance of the noise) and is denoted in the sequel by s2n, defined as s2n $= 1/\sigma^2$.

(a) Uniform distribution: $g(x) = 1/(2\sqrt{3})\,\mathbf{1}_{[-\sqrt{3},\sqrt{3}]}(x)$, $g^*(x) = \sin(x\sqrt{3})/(x\sqrt{3})$, $I = [-5, 5]$.

(b) Exponential distribution: $g(x) = e^{-x}\mathbf{1}_{\mathbb{R}^+}(x)$, $g^*(x) = 1/(1 - ix)$, $I = [-5, 10]$.

(c) $\chi^2(3)$-type distribution: $X = U/\sqrt{6}$, $g_X(x) = \sqrt{6}\,g_U(\sqrt{6}x)$, $U \sim \chi^2(3)$, where we know



that U ~ r(|, 1), 



9u(x) = -—^-—e-^M 112 ,^) 



2 5 /2r(3/2) 11 ' auK ' (l-2ix) 3 / 2 ' 

and I = [-1,16]. 

(d) Laplace distribution: as given in (11), $I = [-5, 5]$.

(e) Gamma distribution: $\Gamma(2, 3/2)$, with density $g(x) = (3/2)^2 x\exp(-3x/2)\mathbf{1}_{\mathbb{R}^+}(x)$, $g^*(x) = -9/(4x^2 + 12ix - 9)$. This density has variance 8/9, and is renormalized for simulation, $I = [-5, 25]$.

(f) Mixed Gamma distribution: X = with W ~ 0.4r(5, 1) + 0.6r(13, 1), 

$g_W(x) = \left[ 0.4\,\frac{x^4 e^{-x}}{\Gamma(5)} + 0.6\,\frac{x^{12} e^{-x}}{\Gamma(13)} \right]\mathbf{1}_{\mathbb{R}^+}(x), \quad g_W^*(x) = \frac{0.4}{(1 - ix)^5} + \frac{0.6}{(1 - ix)^{13}},$

and I = [-1.5,26]. 

(g, h, i) Stable distributions of index $r = 1/4$ (g), $r = 1/2$ (h), $r = 3/4$ (i). In those cases, the explicit form of $g$ is not available but we use that $|g^*(x)| = \exp(-|x|^r)$. The ISE is computed with method (E2).
(j) Cauchy distribution: $g(x) = (1/\pi)(1/(1 + x^2))$, $g^*(x) = e^{-|x|}$, $I = [-10, 10]$.
(k) Gaussian distribution: $X \sim \mathcal{N}(0, \sigma^2)$ with $\sigma = 1$, $I = [-4, 4]$.
(1) Mixed Gaussian distribution: X ~ ^/2V with V ~ 0.57V(-3, 1) + 0.5AT(2, 1) 

$g_V(x) = 0.5\,\frac{1}{\sqrt{2\pi}}\left( e^{-(x+3)^2/2} + e^{-(x-2)^2/2} \right), \quad g_V^*(x) = 0.5\left( e^{-3ix} + e^{2ix} \right)e^{-x^2/2},$

and $I = [-8, 7]$.

(m, n, o, p) Scale transforms of the Fejér-de la Vallée-Poussin distribution:

$g(x) = \frac{1 - \cos(px)}{p\pi x^2}, \quad g^*(x) = (1 - |x|/p)_+,$

for $p = 1$ in (m), $p = 5$ in (n), $p = 10$ in (o) and $p = 13$ in (p), and $I = [-10, 10]$.

Densities (a,b,c,d,e,f) correspond to cases with $r = 0$ (Sobolev smoothness properties) with different values of $s$, whereas densities (g,h,i,j,k,l) correspond to cases with $r > 0$ (infinitely differentiable) with different values for the power $r$. Clearly, (a,b) are not even continuous.

Since the stable distributions (g,h,i) as well as the Cauchy distribution (j) have infinite variance, s2n $= 1/\sigma^2$ is not properly defined.

The stable distributions (g,h,i) also allow us to study the robustness of the estimation procedure when assumption (A4) is not fulfilled. When the density to be estimated $g$ is of type (g,h,i), the tails of $g(x)$ are known to behave like $|x|^{-(r+1)}$ (see Devroye (1986)). It follows that, for such densities, assumption (A4) is fulfilled only if $r > 1/2$. Consequently only the stable distribution (i) satisfies (A4).



The case of distributions (m,n,o,p) deserves some special comments: they correspond to densities whose Fourier transform has compact support included in $[-1, 1]$ for (m), $[-5, 5]$ for (n), $[-10, 10]$ for (o) and $[-13, 13]$ for (p). As a consequence, the bias term $\int_{|x| \ge \ell_m} |g^*(x)|^2\,dx$ equals zero as soon as $\ell_m \ge 1$ for (m), $\ell_m \ge 5$ for (n), $\ell_m \ge 10$ for (o) and $\ell_m \ge 13$ for (p). Therefore, the asymptotic rate for estimating this type of density is the parametric rate.

All the densities listed above are plotted in Figure 1. Note that for the stable distributions, since no explicit form is available, we give in fact the plot of the projection of the distribution on the space $S_m^{(n)}$ (for $\ell_m = 10\pi$) as computed by the projection algorithm.

We refer to Devroye (1986) for simulation algorithms of stable and Fejér-de la Vallée-Poussin distributions.



4.2. Two settings for the errors and the associated penalties. We consider two types of error density $f_\varepsilon$: the first one is the Laplace distribution, which is ordinary smooth ($\delta = 0$ in (A3)), and the second one is the Gaussian distribution, which is super smooth ($\delta > 0$ in (A3)).

The penalty is connected to the variance order. In both settings, we specify this variance order and the value of the integral appearing in it. Since the theory only gives the order of the penalty, we fixed the constant $\kappa$ by simulation experiments and we specify some additional terms, negligible with respect to the theory, that are used to improve the practical results. In both cases we give the penalty given in Comte et al. (2005) with $\Delta = \pi$ in $\ell_m = \Delta m$ and the new penalty allowing the use of a thinner grid for the $\ell_m$'s: here we take $\Delta = 1/10$.

• Case 1: Double exponential (or Laplace) $\varepsilon$'s.

In this case, the density of $\varepsilon$ is given by

(11) $f_\varepsilon(x) = e^{-\sqrt{2}|x|}/\sqrt{2}, \quad f_\varepsilon^*(x) = (1 + x^2/2)^{-1}.$

This density corresponds to centered $\varepsilon$'s with variance 1, and satisfies (A3) with $\gamma = 2$, $\kappa_0 = 1/2$ and $\mu = \delta = 0$.

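The variance order is evaluated as

$\kappa\,\frac{\ell_m}{2\pi n}\int_{-1}^{1} \frac{dx}{|f_\varepsilon^*(\sigma\ell_m x)|^2} = \kappa\,\frac{\ell_m}{\pi n}\left[ 1 + \frac{\sigma^2\ell_m^2}{3} + \frac{\sigma^4\ell_m^4}{20} \right].$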
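As a check on this variance order, the integral can be expanded term by term using $f_\varepsilon^*(x) = (1 + x^2/2)^{-1}$ from (11). The shorthand $a = \sigma^2\ell_m^2/2$ is introduced here only for readability:

```latex
\begin{align*}
\int_{-1}^{1} \frac{dx}{|f_\varepsilon^*(\sigma\ell_m x)|^2}
  &= \int_{-1}^{1} (1 + a x^2)^2\,dx
   = \int_{-1}^{1} \left(1 + 2a x^2 + a^2 x^4\right)dx \\
  &= 2 + \frac{4a}{3} + \frac{2a^2}{5}
   = 2\left[1 + \frac{\sigma^2\ell_m^2}{3} + \frac{\sigma^4\ell_m^4}{20}\right].
\end{align*}
```

Multiplying by $\kappa\,\ell_m/(2\pi n)$ gives $\kappa\,(\ell_m/(\pi n))\left[1 + \sigma^2\ell_m^2/3 + \sigma^4\ell_m^4/20\right]$, so the variance order grows like $\ell_m^5/n$ for large $\ell_m$, in agreement with $\ell_m^{2\gamma+1-\delta}/n$ for $\gamma = 2$, $\delta = 0$.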



Let us recall that, in Comte et al. (2005), $\Delta = \pi$, $\kappa = 6\pi$ and the penalty is the following

^ 3 oHt 



(12) 



pen( 



6 

n 



+ 7rln A5 (C/vr) + 



a 



+ 



20 



The additional term $(\ln(\ell_m/\pi))^{2.5}$ is motivated by the works of Birgé and Rozenholc (2002) and Comte and Rozenholc (2004). This term improves the quality of the results by making the penalty slightly heavier when $\ell_m$ becomes smaller.

Here, using an intensive simulation study, we propose the following penalty:
(13)

with 
(14) 
• 

(15) 



2.5 / 1 
pen(^ m ) = — 1 - — 
n \ sin 



^ + 81n 2 - 5 (C(U) + 2^+3(l + 4 , , n 

3 \ sin I 10 



2 oH 



C(4 



"l2<f m <4 + 



•m^m>4- 



4(vr - 2) 

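Case 2: Gaussian $\varepsilon$'s. In that case, the errors density $f_\varepsilon$ is given by

$f_\varepsilon(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}, \quad f_\varepsilon^*(x) = e^{-x^2/2}.$

This density satisfies (A3) with $\gamma = 0$, $\kappa_0 = 1$, $\delta = 2$ and $\mu = 1/2$.

According to Theorem 2.1 the penalty is slightly heavier than the variance term, that is of order

$\kappa\,\ell_m^{(3\delta/2 - 1/2)\wedge\delta}\,\frac{\ell_m}{2\pi n}\int_{-1}^{1} \frac{dx}{|f_\varepsilon^*(\sigma\ell_m x)|^2} = \kappa\,\ell_m^{(3\delta/2 - 1/2)\wedge\delta}\,\frac{\ell_m}{2\pi n}\int_{-1}^{1} \exp(\sigma^2\ell_m^2 x^2)\,dx.$

Comte et al. (2005), for $\Delta = \pi$, choose $\kappa = 6\pi$ and their penalty is the following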



(16) 



pen(4 



6 

n 



+ vrln^(C/vr) + 



e2 a 2 



exp[(a£ m x) 2 ]dx. 



According to the theory, the loss due to the adaptation is the term $\sigma^2\ell_m^2/3$. As previously, the additional term $\ln(\ell_m/\pi)^{2.5}$ is motivated by simulations and the works of Birgé and Rozenholc (2002) and Comte and Rozenholc (2004).

Using an intensive simulation study we propose the following penalty



(17) 



pen(4, 



2.5 

n 



1 



1 

sin 



£ m + 81n 2 - 5 (C(C)) + 



er 



'.£3 



ex.p[(a£ m x) 2 ]dx, 



where $\zeta(\ell_m)$ is defined by (14) and the integral is numerically computed.

Remark 4.1. Note that when $\sigma = 0$, both penalties are equal to $(2.5/n)(\ell_m + 8\ln(\zeta(\ell_m))^{2.5})$.

Remark 4.2. Since $\Delta = 1/10$ we choose new constants and add a factor depending on s2n in (13) and (17) with respect to (12) and (16). The function $\zeta(\ell_m)$ is only chosen to give a smoother version of $\ell_m \vee \pi$. The comparison of the penalty (12) for integer $L_m$'s, the new penalty with $\zeta(\ell_m) = \ell_m \vee \pi$ (not smoothed) and our final choice in (13) is given in Figure 2 for $\sigma^2 = 0$ and for $\sigma^2 = 0.1$. The difference between the two $\zeta$ functions clearly vanishes when $\sigma^2$ increases.

Remark 4.3. The influence of over- or under-penalization is illustrated in Figure 3, where
three penalties are tested for the estimation of the mixed Gaussian distribution. The figure plots
the selected $\ell_m$'s against the ISE for 100 simulated paths of the distribution. It shows that
over-penalization leads to smaller selected $\ell_m$'s with increased ISE's, whereas under-penalization
leads to greater selected $\ell_m$'s with a larger increase of both the dimensions and the
ISE's. The central cloud of diamonds gives the selected $\ell_m$'s for our penalization and shows that,
for this distribution, our penalty is very well calibrated.

As illustrated by Figure 3, under-penalization usually leads to larger values of $\ell_m$ and increases
the variance, which degrades the MISE more than over-penalization. Hence it is better to guard
against under-penalization, and the penalty is therefore increased. Here, since $\ell_m$ takes values on a
thin grid, guarding against under-penalization is less important, and one can choose a smaller
penalty which leads to a better trade-off between bias and variance. This leads to a better
control of the risk.

Remark 4.4. It is noteworthy that the penalty functions (13) and (17) depend on s2n, which
is unknown. In Section 5.4 we propose a study of the robustness of the algorithm when s2n $=
\mathrm{Var}(X)/\sigma^2 = \mathrm{Var}(Z)/\sigma^2 - 1$ is replaced by a simple estimator (the empirical variance of the
observed $Z_i$'s instead of the theoretical one).

4.3. Theoretical rates in our examples. In order to compare the MISE resulting from our
simulations, we give in Table 2 the expected theoretical (and asymptotic) rates corresponding
to each case we study.

It is noteworthy that even if the theoretical results are established for densities satisfying the
smoothness condition imposed on $g$, since we are in a simulation study, we consider the explicit form
of the Fourier transform of $g$ to evaluate the bias. Consequently, for the calculation of the expected
theoretical rates given in Table 2, we denote by $s$, $r$ and $b$ the constants such that

(18) $$\|g - g_m\|^2 \le \frac{1}{2\pi}\int_{|x|\ge \ell_m} |g^*(x)|^2\,dx \le \frac{A}{2\pi}\,(\ell_m^2+1)^{-s}\exp\{-2b\ell_m^r\}.$$

Then, we evaluate the theoretical rate of convergence by using the results in Table 1 with those
$s$, $r$ and $b$.

Let us briefly comment on Table 2. With those choices of test densities, we describe all types of
behavior of the rates. According to Theorem 2.1, when $0 \le \delta \le 1/3$ or $r < \delta$, that is, except in
the case where $f_\varepsilon$ is the Gaussian density and the density to be estimated is also Gaussian, the
expected rate of convergence of the adaptive estimator $\tilde g$ is the expected rate of convergence of
the non-penalized estimator $\hat g_m$ with asymptotically optimal rate, that is, the rate given in Table 1
with the convention (18) about $s$, $r$ and $b$.

In the remaining case, when $f_\varepsilon$ is the Gaussian density and the density $g$ is also Gaussian,
$r = \delta = 2 > 1/3$, and the penalty is larger, by a logarithmic factor, than the variance of the
non-penalized estimator $\hat g_m$. Since the penalty is the dominating term in the trade-off with the
bias, the rate of convergence of $\tilde g$ is slower than the rate of convergence of the corresponding
non-penalized estimator $\hat g_m$. Let us be more precise. When $g$ is Gaussian, we have a bias term
given by

$$\int_{|x|\ge \ell_m} |g^*(x)|^2\,dx = 2\int_{\ell_m}^{+\infty} \exp(-x^2)\,dx \le 2\int_{\ell_m}^{+\infty} \exp(-\ell_m x)\,dx = 2\,\ell_m^{-1}\exp(-\ell_m^2),$$

and a variance term of order $\ell_m^{-1}\exp(2\mu(\sigma\ell_m)^2)$. So that, according to the convention (18),
we apply the corresponding rate formula with $s = 1/2$, $b = 1/2$, $r = 2$, $\delta = 2$ and $\mu = 1/2$, to get
that the rate of convergence of the non-penalized estimator $\hat g_m$ is of order

$$(\ln n)^{-1/2}\, n^{-1/(\sigma^2+1)}.$$

Now, according to Theorem 2.1, the penalty is of order $\ell_m \exp(2\mu(\sigma\ell_m)^2)$. We obtain that the
rate of convergence of the adaptive estimator $\tilde g$ is of order

$$(\ln n)^{-1/2 + 1/(1+\sigma^2)}\, n^{-1/(\sigma^2+1)}.$$

This implies a negligible loss of order $\ln(n)^{1/(1+\sigma^2)}$ for not knowing the smoothness of $g$.
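The two orders quoted above can be recovered by a short balancing argument. The following worked computation is only a heuristic sketch: it assumes that the variance and the penalty both carry a factor $1/n$ (left implicit in the text) and it ignores multiplicative constants.

```latex
\begin{align*}
&\text{bias}(\ell) \asymp \ell^{-1} e^{-\ell^2}, \qquad
 \text{var}(\ell) \asymp \frac{\ell^{-1} e^{\sigma^2 \ell^2}}{n}, \qquad
 \text{pen}(\ell) \asymp \frac{\ell\, e^{\sigma^2 \ell^2}}{n}. \\[4pt]
&\text{bias} = \text{var}: \quad e^{(1+\sigma^2)\ell^2} = n
 \;\Longrightarrow\; \ell^2 = \frac{\ln n}{1+\sigma^2}
 \;\Longrightarrow\; \ell^{-1} e^{-\ell^2} \asymp (\ln n)^{-1/2}\, n^{-1/(1+\sigma^2)}. \\[4pt]
&\text{bias} = \text{pen}: \quad \ell^2 e^{(1+\sigma^2)\ell^2} = n
 \;\Longrightarrow\; (1+\sigma^2)\ell^2 = \ln n - \ln\ln n + O(1) \\
&\phantom{\text{bias} = \text{pen}: \quad}
 \;\Longrightarrow\; \ell^{-1} e^{-\ell^2} \asymp (\ln n)^{-1/2}
 \left(\frac{n}{\ln n}\right)^{-1/(1+\sigma^2)}
 = (\ln n)^{-1/2 + 1/(1+\sigma^2)}\, n^{-1/(1+\sigma^2)}.
\end{align*}
```

The ratio of the two rates is $(\ln n)^{1/(1+\sigma^2)}$, which is the logarithmic loss mentioned in the text.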

Remark 4.5. Let us mention that taking $\sigma = 0$ in columns 2 and 3 of Table 2 does not always
provide the theoretical rates in the last column, with $\sigma = 0$. Some of the results above are not
continuous when $\sigma \to 0$, especially when we consider Gaussian errors. This comes partly from
the constants depending on $\sigma$, which could completely change when $\sigma$ becomes small, and from
the bound

$$\int_0^{\ell_m} \exp(\sigma^2 x^2)\,dx \le \int_0^{\ell_m} \exp(\sigma^2 \ell_m x)\,dx = \frac{\exp(\sigma^2 \ell_m^2) - 1}{\sigma^2 \ell_m}.$$

The last term is globally equivalent to $\ell_m$ when $\sigma$ tends to zero. But only the first part
$\exp(\sigma^2\ell_m^2)/(\sigma^2\ell_m)$ is retained for $\sigma > 0$ to evaluate the rate of convergence. In a general
setting, the dominant term for the variance term changes when $\sigma$ gets smaller.
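The bound used in this remark and its behavior as the noise level vanishes can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the integral is approximated with a hand-written Simpson rule, and `math.expm1` is used so that the bound stays accurate for very small noise levels.

```python
import math


def simpson(f, a, b, n=400):
    """Composite Simpson rule on [a, b] with an even number n of sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def variance_integral(ell, sigma):
    """Numerical value of int_0^ell exp(sigma^2 x^2) dx."""
    return simpson(lambda x: math.exp((sigma * x) ** 2), 0.0, ell)


def upper_bound(ell, sigma):
    """(exp(sigma^2 ell^2) - 1) / (sigma^2 ell), valid for sigma > 0."""
    return math.expm1((sigma * ell) ** 2) / (sigma**2 * ell)


for ell, sigma in [(1.0, 0.5), (2.0, 1.0), (4.0, 0.3)]:
    print(ell, sigma, variance_integral(ell, sigma), upper_bound(ell, sigma))
```

For a tiny noise level the bound is numerically indistinguishable from the cut-off length itself, which illustrates why only part of the bound drives the rate when the noise level is fixed and positive.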

5. Simulation results 

5.1. Some examples. Figures 31 and El illustrate the performances of the algorithm and the 
quality of the estimation for ordinary and super smooth functions g. Not surprisingly, the 
uniform distribution or the stable 1/2 distribution are not very well estimated, whereas the 
quality of the estimation for the four other functions is very good. 

Let us start a brief comparison with the results in Comte et al. (2005). It is noteworthy that,
for the mixed Gaussian density for instance, the length selected by the algorithm with $\Delta = 1/10$
corresponds to an $L_m$ which is much smaller than 1, since $\ell_m = \pi L_m$. Moreover, the other choices
illustrate that the algorithm takes full advantage of the more numerous possible choices
for the $\ell_m$'s. Besides, the selected lengths are always quite small and thus far from
asymptotic.

5.2. Mean Integrated Squared Errors. For all simulations, the MISE is evaluated by
empirical estimation over 1000 samples. Table 3 presents the MISE for the two types of errors, the
different tested densities, different s2n and different sample sizes.

The first comment on Table 3 concerns the importance of $\sigma$. Clearly the MISE are smaller
when there is less noise ($\sigma$ small, s2n large).

The second comment is about the relatively bad results for the estimation of stable distributions,
especially for the stable distribution with parameter 1/4. If we look at the theoretical rate
of order $(\ln n)^{20}/n$, we easily see that this rate tends to zero, but the asymptotic regime is very far
from the considered sample sizes, as illustrated in Section 5.3. Also note that,
in those cases, the MISE is computed with method (E2), which leads to
larger MISE than those computed with (E1) (two or three times larger, or more),
as illustrated by the comparisons in Section 5.8.
Table 3 specifies that we take $M = 8$.

5.3. Comparison of empirical and theoretical rates. The rates can be illustrated from
Table 3 by plotting the MISE obtained as a function of $n$. This allows us to compare the empirical
and the theoretical asymptotic rates and to evaluate the influence of the value of $\sigma^2$. It is worth
emphasizing anyway that in the case where the error is Gaussian and $g$ super-smooth (densities
(g,l)), the rate is directly a function of $\sigma^2$. Moreover, the rate is clearly better than logarithmic.

In order to compare the empirical MISE with the theoretical MISE, we plot in all cases, for
all values of $n$ and of s2n, the log-MISE as a function of $\ln(n)$. In order to allow the comparison
with the theoretical rates, these log-rates are plotted as dashed lines (abacuses) as functions of
$\ln(n)$. Each abacus corresponds to a different value of the (unknown) multiplicative constant in
the rate. The results are plotted in Figures 7 (Laplace errors) and 8 (Gaussian errors).

Consider for instance the case of the Mixed Gamma distribution with Laplace errors in Figure 7,
sixth subplot. The dashed abacuses give the log of $n^{-9/14}$ (theoretical rate, see Table 2) up to
an additive constant. The full lines give the empirical rates for s2n = 2 to s2n = 1000 from top
to bottom. As $-(9/14)\ln(100) \approx -3$, one can deduce from the plot that, since the intercept is
between $-5.5$ and $-6$, the constant is between $e^{-2.5}$ and $e^{-3}$, and the rate is of order
$0.08\,n^{-9/14}$ for s2n = 2 and $0.05\,n^{-9/14}$ for s2n = 1000.
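The arithmetic behind this reading of the plot can be reproduced in a few lines. The sketch below assumes that the "intercept" is the log-MISE read at $n = 100$, so that the log of the multiplicative constant is the intercept minus $-(9/14)\ln(100)$; the variable and function names are hypothetical.

```python
import math

SLOPE = -9.0 / 14.0                    # exponent of the theoretical rate
log_rate_100 = SLOPE * math.log(100)   # log of n^(-9/14) at n = 100


def constant_from_intercept(intercept, log_rate=-3.0):
    """Constant c such that log(c) + log_rate = intercept.

    log_rate defaults to the rounded value -3 used in the text.
    """
    return math.exp(intercept - log_rate)


print(round(log_rate_100, 2))                    # close to -3
print(round(constant_from_intercept(-5.5), 3))   # e^{-2.5}
print(round(constant_from_intercept(-6.0), 3))   # e^{-3}
```

With the rounded value $-3$, the two intercepts give constants of about 0.08 and 0.05, matching the orders quoted in the text.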

We can see that most results are in very good accordance with the theoretical predictions,
but a few results in the case of Laplace errors are less satisfactory. Figure 9 explains the reason
for this: when we plot the theoretical log-rates as functions of $n$ in those cases, we find
that the asymptotic regime that makes the logarithmic part of the rate negligible is reached only
for very large values of the sample size $n$. It is quite positive anyway to see that in those bad cases,
our method behaves much better than what could be hoped from the asymptotics. Figure 9
plots these curves including some higher values of $n$, going up to $n = 25000$, to show how far
the asymptotics are in practice.

Note that, for the rates depending on $\sigma$, we arbitrarily chose s2n = 4, since it was not possible
to give several theoretical curves. On the one hand, it appears from the Cauchy distribution
that even if the corresponding theoretical assumption is not satisfied, the procedure can work. On the
other hand, stable distributions show nevertheless that a narrow peak can be quite difficult to estimate.

5.4. Robustness when s2n is estimated. We now propose a study of the robustness of the
algorithm when s2n $= \mathrm{Var}(X)/\sigma^2 = \mathrm{Var}(Z)/\sigma^2 - 1$ is replaced by a simple estimator (the empirical
variance of the observed $Z_i$'s instead of the theoretical one). The MISE is computed with the
algorithm built on a penalty with an estimated s2n with a lower bound $1/0.6$, that is, about
1.67. This lower bound is required mainly for s2n = 2. As already mentioned,
under-penalization can make the MISE explode and must be avoided. We compute the ratio of the
MISE obtained with the estimated s2n over the MISE of Table 3 when s2n is known, and we
obtain ratios equal to one, except in the cases given in Table 4, which remain of order one for
most of them. The empirical s2n in the penalty has therefore very little influence.
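A minimal sketch of such a plug-in estimator is given below. It is not the authors' code: the function name `estimate_s2n` is hypothetical, and the use of the population variance (division by $n$) is an assumption, since the text only says "empirical variance". The noise level is treated as known, as in the paper.

```python
import statistics


def estimate_s2n(z, sigma, floor=1 / 0.6):
    """Plug-in signal-to-noise ratio Var(Z)/sigma^2 - 1, bounded below by `floor`.

    z     : observed sample Z_1, ..., Z_n
    sigma : known noise level
    floor : lower bound (about 1.67 in the text) guarding against under-penalization
    """
    var_z = statistics.pvariance(z)      # empirical variance of the Z_i's (assumed 1/n version)
    s2n_hat = var_z / sigma**2 - 1.0
    return max(s2n_hat, floor)


print(estimate_s2n([0.0, 2.0, 0.0, 2.0], sigma=0.5))  # Var(Z) = 1, so 1/0.25 - 1 = 3
```

When the raw estimate falls below the floor, for instance in small or very noisy samples, the floor is returned instead, which keeps the penalty from becoming too small.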

5.5. Comparison with some dependent samples. 

5.5.1. Two $\beta$-mixing examples. In Comte et al. (2005), most of the asymptotic properties of the
adaptive estimator $\tilde g$ are stated in the i.i.d. case, but some robustness results are also provided.
More precisely, it is shown that, when both the $X_i$'s and the $\varepsilon_i$'s are absolutely regular, under
some weak condition on the $\beta$-mixing coefficients, the $L^2$-risk of the adaptive estimator $\tilde g$ has
the same order as in the independent case. The main change is the multiplicative constant in the
penalty term, which involves the sum of the $\beta$-mixing coefficients. In other words, the adaptive
procedure remains relevant for dependent data. Here we propose to study the performances of
the computed estimator when the $X_i$'s are $\beta$-mixing, and so are the $Z_i$'s.

This study is done by comparing the MISE obtained respectively for the Gaussian (k) and
the mixed Gaussian (l) distributions in the independent case with the MISE obtained in
the dependent cases generated as follows.

• Construction of the dependent sequence of the $X_i$'s with stationary standard Gaussian
distribution (k).

Let $(\eta_k)_{k\ge 0}$ be a sequence of i.i.d. Gaussian random variables with mean 0 and variance $\sigma^2$.
Let $(Y_k)_{0\le k\le n+1000}$ be a sequence recursively generated by

(19) $Y_{k+1} = aY_k + b + \eta_{k+1}$, $Y_0 = 0$, $0 < a < 1$.

In that case, the distribution of the $Y_k$'s converges with exponential rate to
a unique stationary distribution, which is the Gaussian distribution $\mathcal N(b/(1-a), \sigma^2/(1-a^2))$.
Therefore, we take, as an $n$-sample of $X$, the sequence $(X_1, \dots, X_n) = (Y_{1001}, \dots, Y_{n+1000})$, and
we choose $b = 0$ and $\sigma^2 = 1 - a^2$ in (19), so that the stationary distribution of the $X_i$'s is the
standard Gaussian distribution $\mathcal N(0,1)$, that is, the density (k).



• Construction of the dependent sequence of the $X_i$'s with stationary mixed Gaussian
distribution (l).

We propose here to mix two such Gaussian sequences, independent from each other. More
precisely, we generate two sequences using the method described previously. We first generate
$Y^{(1)}_k$, $k = 1, \dots, n+1000$, with $\sigma^2 = 1 - a^2$, $b = -3(1-a)$, and second $Y^{(2)}_k$, $k = 1, \dots, n+1000$,
with $\sigma^2 = 1 - a^2$, $b = 2(1-a)$. Finally we generate some uniform variable on $[0,1]$, denoted
by $U$, and propose to take $X_k = Y^{(1)}_{k+1000}$ if $U < 0.5$ and $X_k = Y^{(2)}_{k+1000}$ otherwise. Clearly, the
covariance between $X_i$ and $X_{i+1}$ is divided by two thanks to the independent additional
uniform sequence standardly used for the mixing of the distributions. It follows that the stationary
distribution of the $X_i$'s is the mixed Gaussian distribution (l).
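The two constructions above can be sketched with the standard library only. This is an illustrative sketch, not the authors' simulation code: the function names are hypothetical, a fresh uniform variable is drawn for every index (an assumption consistent with the "additional uniform sequence" mentioned in the text), and the first 1000 values of the recursion are discarded as burn-in.

```python
import math
import random


def ar1_gaussian(n, a, b=0.0, burn_in=1000, rng=None):
    """Return (Y_{burn_in+1}, ..., Y_{n+burn_in}) for Y_{k+1} = a*Y_k + b + eta_{k+1}, Y_0 = 0.

    The innovations eta_k are N(0, 1 - a^2), so the stationary law is N(b/(1-a), 1).
    """
    rng = rng or random.Random()
    sd = math.sqrt(1.0 - a * a)
    y, sample = 0.0, []
    for k in range(1, n + burn_in + 1):
        y = a * y + b + rng.gauss(0.0, sd)
        if k > burn_in:
            sample.append(y)
    return sample


def mixed_gaussian_dependent(n, a, rng=None):
    """Mix two independent AR(1) sequences with stationary means -3 and 2."""
    rng = rng or random.Random()
    y1 = ar1_gaussian(n, a, b=-3.0 * (1.0 - a), rng=rng)
    y2 = ar1_gaussian(n, a, b=2.0 * (1.0 - a), rng=rng)
    # One uniform draw per index selects which sequence is observed.
    return [v1 if rng.random() < 0.5 else v2 for v1, v2 in zip(y1, y2)]


x_gauss = ar1_gaussian(500, a=0.5, rng=random.Random(1))
x_mixed = mixed_gaussian_dependent(500, a=0.5, rng=random.Random(2))
```

The noisy observations would then be obtained by adding independent errors with the chosen noise level to these sequences, as in model (1).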

In both contexts, we generate such sequences of $X_i$'s for different values of $a$, $0 < a < 1$. Such
sequences are known to be geometrically $\beta$-mixing, with $\beta$-mixing coefficients $(\beta_k)_{k\ge 0}$ such that
$\beta_k \le M e^{-\theta k}$ for some constants $M$ and $\theta$. The closer $a$ is to 1, the stronger the dependency.

We study the properties of $\tilde g$ for different values of $a$ by computing the ratio between the
resulting MISE and the MISE obtained in the independent cases (k,l). The results are presented
in Table 5 and Table 6.

We can see that the procedure behaves in the same way in both cases, and that the
MISE ratios comparing dependency to independence get higher when $a$ increases and gets
closer to one. The results remain quite good up to $a = 0.8$, and even 0.9 for small s2n's, if we
keep in mind that the MISE is very low in the independent case for these two distributions.

Globally, for reasonable values of $a$ (at least between 0 and 0.75), the dependency does not
seem to bring any additional problem.

5.5.2. A dependent but non mixing example. We also simulate the following dependent model.
Generate $(\eta_i)_{1\le i\le n+1000}$, an i.i.d. Bernoulli sequence ($\eta_i = 0$ or 1 with probability 1/2). Then
generate $U_{i+1} = (1/2)U_i + \eta_{i+1}$ with $U_0 = 0$, for $i = 1, \dots, n+1000$. Take $X_k = \sqrt{3}\,(U_{k+1000} - 1)$
for $k = 1, \dots, n$. The stationary distribution of the $U_i$'s is the uniform density on $[0,2]$, and
therefore the distribution of the $X_k$'s is the distribution (a), uniform on $[-\sqrt{3}, \sqrt{3}]$. This model
is however known to be dependent and non mixing (see e.g. Bradley (1986)). We run
the estimation procedure and compute the ratio of the MISE for this model to the MISE
in the independent case (a), for the different values of s2n and sample sizes. The resulting table
is not given here because it contains essentially ones, the entries different from one being at most 1.1.
This may be due to the poor quality of our estimation of the uniform distribution even in the
independent context, which is then not worse in this special dependent context. But this also
shows that the procedure may be robust to some forms of dependency quite different from the ones
usually met in the statistical literature.
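The dependent, non-mixing model can be simulated as follows. This is a sketch under the reading that the recursion adds a Bernoulli(1/2) innovation at each step, which is what makes the stationary law uniform on $[0,2]$; the function name is hypothetical.

```python
import math
import random


def bernoulli_shift_sample(n, burn_in=1000, rng=None):
    """Simulate X_k = sqrt(3) * (U_{k+burn_in} - 1), with U_{i+1} = U_i / 2 + eta_{i+1}, U_0 = 0.

    The eta_i are i.i.d. Bernoulli(1/2); the X_k are then (approximately)
    uniform on [-sqrt(3), sqrt(3)], with mean 0 and variance 1.
    """
    rng = rng or random.Random()
    u, sample = 0.0, []
    for i in range(1, n + burn_in + 1):
        u = 0.5 * u + rng.randint(0, 1)
        if i > burn_in:
            sample.append(math.sqrt(3.0) * (u - 1.0))
    return sample


x = bernoulli_shift_sample(1000, rng=random.Random(3))
print(min(x), max(x))  # both values lie within [-sqrt(3), sqrt(3)]
```

Each new value is half the previous one plus a coin flip, so consecutive observations are strongly tied together even though every marginal is uniform.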

5.6. Comparison with Delaigle and Gijbels' (2004a). We propose here to compare the
performances of our adaptive estimator with the performances of the deconvolution kernel estimator as
presented in Delaigle and Gijbels (2004a). This comparison is done for densities (e,f,k,l), which
correspond to the densities #2, #6, #1 and #3 respectively in Delaigle and Gijbels (2004a).
They give median ISE obtained with kernel estimators by using four different methods of
bandwidth selection. Table 7 compares their results with the median ISE that we compute over 1000
samples generated with the same length and signal to noise ratio as in Delaigle and Gijbels (2004a).
We compute the MISE's with direct approximation of the integrals on the same intervals as they
do, see Section 4.1. We also give our corresponding means, since we think that they are more
meaningful than medians. With a multiplicative constant in the penalty smaller than the one
we chose, it may happen that medians are much better but means become huge, simply because
of a small number of bad paths. The cost of such bad paths is therefore reflected in the
means and completely hidden by the medians.

We can see that our estimation procedure provides results of the same quality for the ordinary
smooth densities, namely for the $\chi^2(3)$ and the Mixed Gamma densities, but that our results are
globally better for super-smooth densities (namely, the Gaussian and the mixed Gaussian
densities). It is noteworthy that in this case the new penalty functions given in (13) and (17)
give better MISE than the penalty functions (12) and (16) provided in Comte et al. (2005).

5.7. Comparison with direct density estimation when s2n is large. We propose now to
study the robustness of our procedure when s2n is large, that is, when the $X_i$'s are in fact almost
observed. We propose to compare the non asymptotic properties of our deconvolution estimator
when s2n = 10000 with those presented in a recent work by Dalelane (2004) about adaptive
data-driven kernel estimators for density estimation (based on the sample $(X_1, \dots, X_n)$). We
consider here three of the four densities considered by Dalelane (2004), namely the normal
density (k), the scale transform of the Fejér-de la Vallée Poussin density (the Fejér 5 distribution
given by (n)), and the $\Gamma(2,3/2)$ distribution (d). The results are given in Table 8. We give the
MISE for Laplace errors, since the MISE for Gaussian errors are essentially the same when
s2n = 10000.

Even in these circumstances, which are very unfavorable to our estimator, we find that
our method performs very well for the Gaussian distribution (often even better than
Dalelane's (2004) estimator), quite well for the Gamma density, where the MISE's are of the same
order, and also for the Fejér 5 distribution for $n = 500$ or $n = 1000$. Only the results for the Fejér 5
distribution when $n$ is small ($n = 50, 100$) give much higher MISE's.

Therefore, it appears that our density deconvolution estimator performs quite well despite the
great number of additional numerical approximations as compared to Dalelane's (2004) results.

5.8. Comparison of methods (E1) and (E2): evaluation of the MISE on $\mathbb{R}$ versus
on an interval. Here, we want to compare the two methods of computation of the MISE, on
an interval and on $\mathbb{R}$, as described in Section 4, for a set of densities for which both methods
are possible: exponential, $\chi^2(3)$, Laplace, Cauchy. In those cases, we can evaluate the bias as
follows:

$$\|g - g_m\|^2 = \frac{1}{2\pi}\int_{|x|\ge \ell_m} |g^*(x)|^2\,dx$$

with

* for $g$ an exponential distribution (b), $\int_{|x|\ge \ell_m} |g^*(x)|^2\,dx = 2\arctan(1/\ell_m)$,

* for $g$ a normalized $\chi^2(3)$ (c), $\int_{|x|\ge \ell_m} |g^*(x)|^2\,dx = \sqrt{6}\left(1 - \ell_m/\sqrt{\ell_m^2 + 3/2}\right)$,

* for $g$ a normalized Laplace density (d), $\int_{|x|\ge \ell_m} |g^*(x)|^2\,dx = \sqrt{2}\arctan(\sqrt{2}/\ell_m) - 2\ell_m/(2 + \ell_m^2)$,

* for $g$ a Cauchy distribution (j), $\int_{|x|\ge \ell_m} |g^*(x)|^2\,dx = e^{-2\ell_m}$.

This allows us to apply method (E2) to compute the "true" MISE on the whole real line.
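The closed-form tail integrals used for the bias can be checked numerically. The sketch below is illustrative: the function names are hypothetical, "normalized" is taken to mean unit variance (an assumption), and the chi-square and Laplace expressions are derived here from the corresponding characteristic functions rather than read from the text. The infinite range is mapped to $[0,1)$ by the change of variable $x = \ell + t/(1-t)$ and integrated with a midpoint rule.

```python
import math


def numeric_tail(sq_modulus, ell, n=20000):
    """Midpoint approximation of 2 * int_ell^inf sq_modulus(x) dx via x = ell + t/(1-t)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        x = ell + t / (1.0 - t)
        total += sq_modulus(x) / (1.0 - t) ** 2
    return 2.0 * total * h


# Squared moduli |g*(x)|^2 (unit-variance normalization assumed where relevant).
def sq_exponential(x):
    return 1.0 / (1.0 + x * x)

def sq_chi2_3(x):
    return (1.0 + 2.0 * x * x / 3.0) ** -1.5

def sq_laplace(x):
    return (1.0 + x * x / 2.0) ** -2

def sq_cauchy(x):
    return math.exp(-2.0 * abs(x))


# Closed forms of int_{|x| >= ell} |g*(x)|^2 dx.
def tail_exponential(ell):
    return 2.0 * math.atan(1.0 / ell)

def tail_chi2_3(ell):
    return math.sqrt(6.0) * (1.0 - ell / math.sqrt(ell * ell + 1.5))

def tail_laplace(ell):
    return math.sqrt(2.0) * math.atan(math.sqrt(2.0) / ell) - 2.0 * ell / (2.0 + ell * ell)

def tail_cauchy(ell):
    return math.exp(-2.0 * ell)


for name, f, closed in [("exponential", sq_exponential, tail_exponential),
                        ("chi2(3)", sq_chi2_3, tail_chi2_3),
                        ("Laplace", sq_laplace, tail_laplace),
                        ("Cauchy", sq_cauchy, tail_cauchy)]:
    print(name, numeric_tail(f, 1.5), closed(1.5))
```

Dividing any of these tail integrals by $2\pi$ gives the squared bias of the projection at the corresponding cut-off.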

It appears from Table 9 that the computation of the MISE's with method (E2) gives results
which are about two or three times greater than with method (E1), except in the case of the
exponential law, where some numerical problems seem to occur when s2n becomes greater, and
for the $\chi^2(3)$ distribution, where small samples or high levels of noise seem to induce ratios of
order 10. In the other cases, the ratio decreases when s2n gets greater. The difference between
the two methods of evaluation comes of course from the oscillations of the estimate over the
whole real line, even when the true function tends to zero.

5.9. Results when the errors density is misspecified. We propose here to study the non
asymptotic properties of the estimator when the error density is not correctly specified. For
both types of errors, we study the behavior of the estimator using one type of error density
to choose the penalty when the other type of error density is used for the simulation of the
$Z_i$'s. Table 10 presents the ratio between the resulting MISE when the error density is not correct
and the MISE when the error density is correct. For instance, in the first column, the errors are
Laplace but the estimator is constructed as if the error density were Gaussian. Some theoretical
results on the effect of misspecifying the error distribution can be found in Meister (2004).

Some comments follow. As expected, since the construction uses the knowledge of the error
density, if it is misspecified, the estimator presents some bias and the MISE becomes slightly
larger. Nevertheless, this difference does not clearly appear when $n$ is not very large. Indeed,
in that case, the optimal length $\ell_m$ is small and therefore the variance term of order
$\int_0^{\ell_m} |f_\varepsilon^*(\sigma x)|^{-2}\,dx$ is not very different between the two errors. In order to underline our
comments, we present in Figure 10 the Fourier transforms of the two error densities, the Laplace
and the Gaussian density. Here, $\sigma$ is known. Globally, if we hesitate between Laplace and
Gaussian errors, Table 10 seems to indicate that up to $n = 1000$, it is a good strategy to always
choose Gaussian errors for the estimation procedure.

We also study the behavior of our algorithm when ignoring the noise, that is, by using our
algorithm with $\sigma = 0$ when $\sigma$ is not null. This amounts to considering that the $X_i$'s are observed
($Z_i = X_i$) when it is not the case. In order to do this comparison, we simulate noisy data
(s2n = 2, 4, 10) and run the estimation procedure as if $\sigma = 0$ by putting s2n = 10000 in the
associated penalty. Table 11 presents the ratios between the MISE resulting from the procedure used
with s2n = 10000 and the MISE resulting from the normal procedure, which uses the knowledge of
$\sigma$ and hence of s2n.

Surprisingly, one can observe two different behaviors of the ratios in Table 11. For small values
of $n$, there is no deterioration, and there are even improvements. This can be explained by the fact
that the penalty is smaller when $\sigma = 0$, so the algorithm can choose larger $\ell_m$, which may be of
interest for certain densities when $n$ is small. For larger values of $n$, we clearly see an advantage in
using our deconvolution algorithm over a direct density estimation ignoring the noise.



6. Concluding remarks 

As a conclusion, let us emphasize that we provide a complete simulation study involving all
types of possible theoretical behaviors and rates, which are highly varied in the context of density
deconvolution, depending on the type of the errors and of the distribution to be estimated. The
results are obtained with a fast algorithm using in particular the well-known good performances
of IFFT, and are globally very satisfactory, as compared with some other results given in the
literature. The method is very stable and reliable, even when some conditions set by the theory
are violated (as in the case of stable distributions), and is robust to dependency in the variables.
The standard way of computing the ISE on an interval is nevertheless shown to be more favorable
than the more global method that can be implemented here. The procedure also seems robust to
a misspecification of the error density, provided
that the level of the noise is well calibrated, and is numerically stable enough to recover good
orders as compared to direct density estimation, in spite of many more computations (useless in the
case of direct estimation). Therefore, our global results show that the procedure works
very well, even for finite samples leading to selected lengths very far from the asymptotic orders.



Appendix: proof of Theorem 2.1

The proof essentially follows the lines of the proof of Theorem 4.1 and 4.2 in Comte et 
al. (2005), and details the role of A. We define u n (t) = ± EHiK^) ~ (*> 9>] and 5 m y(0, 1) = 
{t G V £, / ||i|| = 1}. Arguing as in Comte et al. (2005), for x > 1 we have 



112 / / X + 1 

\g-g\\ < T 

x — 1 



Is - S. 



{n)]]2 + x{x + 1) ^ l/ 2 (t) + ^±l (pen( ^ ) _ pen( ^ )) _ 



x - 1 



tes m , A (o,i) 



x- 1 



Choose some positive function p{l m -,^m') such that xp(£ m , l m i) < pen(£ m ) + pen(^ m /). Conse- 
quently, for k x = (x + l)/(x — 1) we have 



(20) \\g-g\\ 2 <K 2 x 



I l|2 , II (n)\\\2 

IS-Sm|| +||Sm-Sm)ll 



+ XK x W n (£ 

in I 

) + pen(£ m ) - pen(£m)) 
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with W n {l m >) : = [sup igBmm , (0i i) Wn(t)\ 2 - p(£ m ,£ ml )] + , and hence 

(21) ||ff ~ gf < 4\\9 ~ 9m\\ 2 + 4 (M2 „ + + 2K xV en{£ m ) + xk x V W n {£ m ,). 

The main point of the proof lies in studying W n (£ m r), and more precisely in finding p(£ m ,< 
such that for a constant K, 

(22) ®(W n (£ m ,))<K/(nA). 

In that case, combining (|21j) and (|22|1 we infer that, for all m in 7W n , 



(23) E\\g-g\\ 2 <C x inf 



|5-5m|| +pen(0 + 



(M 2 + 1)£ 



if 

-f- XK X 7" j 



7T 2 if n 

where C x = k 2 V 2^ suits. It remains thus to find p(£ m ,£m') such that (|22j) holds. This is done 
by applying a version of Talagrand's Inequality (see Talagrand (1996)), to the class of functions 
T = S m m /(0, 1). If we denote by £ m * = £ m V £ m /, we get that 

where I(£ m * ) and II(£ m * ) are defined by 

, tf 27+(l/2-(5/2)A(l-S) ro 

HM = ^ n exp{2Ma c * } eM-^e(M/^)&- 5/2 H 

2 7 +l-5 2na s (l m *) s , . 

H{Zm*) = 2 exp{-(^C(Ov^/v^} , 



A 



2 



with for £ m > £q, 

' 1 if <5 > 1 

A l/2 ( ^2 + CT 2 r /2|| /£l | K -l (27rrl /2 if( j< L 

1) Study o¥J2 meM JI(£ m *). 
If we denote by T(£ m ) = £% +1 ~ 5 exp{2/xa <5 ^} then 

n ( £ ™*) < C(Xi)\M n \ exp [-{K^C{0MI^} r(4J/« 2 

Consequently, as soon as T(£ mn )/n is bounded (we only consider m n such that pen(£ m „) is 
bounded), then J2 m <=M n n (^m*) <C/n 

2) Study of £ meMn /(-M- 

Denote by V = 27 + (1/2 - 5/2) A (1 - 6), u = (1/2 - S/2) + , K' = K 1 X 1 /X 2 , then for a, b > 1, 
we infer that 



(a V &)^e 2 ^ CT (aV6) e~^' 5 (aVb) " < (a^e 2 ^ a + 5V> e W& 4 ) e -(^72)K+^) 

(24) < a *#^a' e -(K t ?/2)a?' e -(K t p/2)V* + b i> ^ b s & -(K'e /2)b- 
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Consequently, if we denote by f the quantity f (£ m *) = ^m* + ^ 2 5 ^ 2 ^ A ^ ^ exp{2//a <5 ^*} then 
£ W) < C 1 (A 2 )^exp{-(^ 2 /2)^ /2 - 5/2) } £ ex P {-(^ 2 /2)e /2 ^ /2) } 

m'&Mn ra'&Mn 

r (^m') r /i^/ c 2s/)(l/2-«5/2)- 



(25) +d(A 2 ) £ ^exp{-(re 2 )C 2 - d/2) }. 



n 

m'eMn 



a) Case < 5 < 1/3. In that case, since 5 < (1/2 — 6/2)+, the choice £ 2 = 1 ensures that 
f (4 l )exp{-(if / £ 2 /2)(4 l )( 1 / 2 - 5 / 2 )} is bounded and thus the first term in (J25J) is bounded by 

C r °° 



nl 



poo 

/ exp{-(K'f)xW 2 - s ^}dx<C/{nA). 
Jo 



In the same way, J2 m 'eM n r (^»') ex P{-(-^'C 2 )^ 2 8,2) }/n is bounded by 

/>oo 

/ (s + i)27+(i/2-5/2)a(i-5) ew {2na s ((x + I)) 5 } exp{-(K'f)x^ 2 - 5 ^}dx < C/(nA). 
Jo 

It follows that Ylm'eM ^i^m*) — C/(nA). Consequently, (|22j) holds if we choose pen(£ m ) = 
2x(l + 2e 2 )A^ 2 7 +1 - 5 exp{2fia s £ s m }/n. 

b) Case 6 = 1/3. In that case, bearing in mind Inequality (|24|) we choose £ 2 such that 2/j,a s £^ n * — 
(K'£ 2 /2)£ s m * = -2fj,a s £ s m * that is £ 2 = (^ficr 5 \ 2 )/(K 1 \ 1 ). By the same arguments as for the 
case < 6 < 1/3, this choice ensures that Ylm'eM I(&m*) < C/(nA), and consequently ((2*2*]) 
holds. The result follows by taking p(£ m ,t m >) = 2(1 + 2£ 2 )\ 1 £ 2 2+ 1 ~ S exp(2fia 5 £ 5 m ,)/n, and 
pen(C) = 2x(l + 2C 2 )Aid 7+1_<5 exp(2/ia 5 4)/n. 

c) Case 6 > 1/3. In that case, 6 > (1/2 — 6/2)+. Bearing in mind Inequality (|24jl we choose 
£ 2 = ^(£ m ,£ m ,) such that 2//<r 5 ^. - {K'g/2)£ w m * = -2fia s £ s m , that is 

e = S 2 (£m,£m>) = (WX 2 )/(K 1 X 1 )£ 5 m ^. 

This choice ensures that Y^m'eM A^m*) — C/{nA), and consequently l"*"2*l) holds and © 
follows iip(£ m ,£ m/ ) = 2(1 + 2C 2 ^ m ,V))Ai4 7 * +1 " 5 exp(2^ C j' 5 0/«, and pen(£ m ) = 2x(l + 
2e(i m ,i m ))Xii 2 nI +1 ~ S exp(2 f ia s £ s m )/n. □ 
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Table 2. Theoretical orders of the rates of the adaptive estimator as deduced 
from Table 1 and formulae (6) and (7) when $\sigma > 0$ and (last column) when $\sigma = 0$.
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Table 3. Empirical MISE obtained with 1000 samples and 
formed with M = 8, for different sample size (n = 100, 250, 
different values of s2n (2, 4, 10, 100, 1000, the higher s2n 
level) . 
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Figure 7. Empirical MISE an 1 d theoretic! asymptotill rates kAgarithmie 1 ° 
scale, when the errors follow a Laplace distribution. From top to bottom, full
lines correspond to increasing s2n (2,4,10,100,1000). Dashed lines are abacuses 
(up to an additive constant) for the log-theoretical rates. 
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Figure 8. Empirical MISE and theoretical asymptotical rates in logarithmic 
scale, when the errors follow a Gaussian distribution. From top to bottom, full
lines correspond to increasing s2n (2,4,10,100,1000). Dashed lines are abacuses 
(up to an additive constant) for the log-theoretical rates. 
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Figure^). Empirical MISE and 2 fhe 1 oreticaf rates in logarithmic sc&fk up to 
25000 when the errors follow a Laplace distribution. The "bad cases" . 
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'able 6. Ratio of MISE obtained with 1000 samples and different va' 
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2.8 


Proj.: median    0.48    0.62    0.23    0.26
Proj.: mean      0.56    0.67    0.27    0.30

Table 7. Lower and higher median ISE obtained by Delaigle and Gijbels (2004)
with four different strategies of bandwidth selection in kernel estimation, compared
with the median and mean for our penalized projection estimator.
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xl(T 2 


(all entries are MISE $\times 10^{-2}$)

g               method         n = 50   n = 100   n = 500   n = 1000
Gaussian        D. Kernel        1.18      0.63      0.13       0.08
                Gauss. Ker.      1.72      1.27      0.28       0.16
                sinc Ker.        2.16      1.14      0.26       0.10
                Proj.            0.84      0.53      0.18       0.12
Fejér 5         D. Kernel        2.29      0.79      0.22       0.13
                Gauss. Ker.      3.07      1.84      0.55       0.22
                sinc Ker.        3.92      1.87      0.55       0.23
                Proj.            6.74      3.93      0.32       0.27
Gamma(2,3/2)    D. Kernel        2.70      1.48      0.52       0.27
                Gauss. Ker.      2.77      2.09      0.61       0.31
                sinc Ker.        6.17      4.03      1.66       0.37
                Proj.            3.13      2.19      0.96       0.65

Table 8. MISE for our projection estimator (Proj.) with Laplace penalty using
s2n = 10000 and for direct density estimation by kernel of Dalelane (2004), with
Gaussian kernel (D. Kernel) or with sin(x)/x kernel (sinc).





                   n = 100         n = 250         n = 500         n = 1000        n = 2500
g         s2n      Lap.    Gaus.   Lap.    Gaus.   Lap.    Gaus.   Lap.    Gaus.   Lap.    Gaus.
Exp.      2        2.7     2.3     3.4     2.7     3.9     3.1     4.6     3.4     5.6     3.8
          4        3       2.5     3.8     3.1     4.5     3.6     5.3     4.1     6.5     4.7
          10       3.4     3       4.5     3.8     5.4     4.6     6.7     5.3     8.6     6.4
          100      3.7     3.7     5       4.9     6.6     6.2     9.8     8.6     15      12
          1000     3.8     3.7     5       5       6.7     6.7     11      10      19      18
Laplace   2        1.4     1.3     1.3     1.3     1.3     1.3     1.3     1.3     1.3     1.3
          4        1.3     1.2     1.3     1.2     1.2     1.2     1.3     1.2     1.3     1.2
          10       1.3     1.2     1.2     1.2     1.3     1.2     1.3     1.2     1.4     1.2
          100      1.3     1.3     1.2     1.2     1.3     1.2     1.4     1.3     1.7     1.5
          1000     1.3     1.3     1.2     1.2     1.2     1.2     1.4     1.4     1.8     1.7
Chi2(3)   2        12      15      11      13      9.2     12      7.8     11      6.1     9.9
          4        12      15      9.6     12      8.1     11      6.6     9.2     5.1     7.7
          10       10      12      8.1     9.8     6.5     8.2     5.1     6.7     3.4     5.2
          100      9.6     9.9     7.4     7.7     5.6     5.9     3.9     4.3     1.9     2.3
          1000     9.6     9.6     7.4     7.3     5.6     5.6     3.9     3.9     1.7     1.8
Cauchy    2        4.6     6.1     4.2     5.3     3.8     4.9     3.3     4.6     2.7     4.3
          4        4.6     6.5     4       5.6     3.6     4.9     3.1     4.4     2.5     3.8
          10       4       5.8     3.5     4.5     3.1     3.9     2.5     3.4     2.2     2.7
          100      3.5     3.7     3.3     3.4     2.8     2.9     2.3     2.5     2       2.1
          1000     3.5     3.5     3.3     3.3     2.8     2.9     2.3     2.4     2       2



Table 9. Ratio of the MISE obtained by method (E2) over the MISE obtained 
with method (E1). 
                       n = 100         n = 250         n = 500         n = 1000        n = 2500
Noise                  Lap.    Gaus.   Lap.    Gaus.   Lap.    Gaus.   Lap.    Gaus.   Lap.    Gaus.
Penalty                Gaus.   Lap.    Gaus.   Lap.    Gaus.   Lap.    Gaus.   Lap.    Gaus.   Lap.
g             s2n
[illegible]   2        0.93    1.1     0.92    1.2     1.1     1.3     1       1.5     1.6     2
              4        0.97    1       0.96    1       0.96    1.1     0.98    1.2     1.1     1.5
              10       0.99    0.99    0.99    0.99    1       1       0.99    1       1       1.2
[illegible]   2        0.98    1.1     0.93    1       0.91    1.1     1       1.2     1.2     1.5
              4        0.99    1       1       1       0.98    1       0.99    1.1     1       1.2
              10       1       1       0.98    1       0.98    1.1     0.98    1       1       1
[illegible]   2        1.1     0.98    1       0.93    1.1     0.91    1.2     1       1.5     1.2
              4        1       0.99    1       1       1       0.98    1.1     0.99    1.2     1
              10       1       1       1       0.98    1.1     0.98    1       0.98    1       1
[illegible]   2        0.95    1       0.93    1.2     0.88    1.2     1.2     1.1     1.5     1.1
              4        0.96    1       0.96    1       1       1       0.95    1       1.1     1
              10       1       0.97    1       0.96    1       1       0.97    1.1     0.99    1
[illegible]   2        0.91    0.98    1.1     0.97    1.1     1       1.2     1.1     1.1     1.1
              4        0.99    1       1       1       0.95    0.99    1       0.96    1       1
              10       1       0.97    0.93    1       0.96    0.96    1.1     1       1       0.98



Table 10. Ratio between MISE with misspecified error density (Laplace errors, 
g estimated as if errors were Gaussian and reciprocally) and MISE with correctly 
specified error density. 





                       n = 100         n = 250         n = 500         n = 1000        n = 2500
g             s2n      Lap.    Gaus.   Lap.    Gaus.   Lap.    Gaus.   Lap.    Gaus.   Lap.    Gaus.
[illegible]   2        1       0.9     1.3     1.2     1.5     1.4     1.9     1.8     2.9     2.2
              4        0.95    0.68    1       0.87    1.2     1       1.5     1.3     2.3     1.9
              10       0.96    0.78    0.98    0.79    1       0.83    1.1     0.99    1.6     1.4
[illegible]   2        0.9     0.89    0.92    0.78    0.99    0.81    1.2     1.1     1.8     1.6
              4        0.93    0.94    0.9     0.62    0.95    0.67    1.1     0.91    1.5     1.3
              10       0.96    0.89    0.92    0.65    0.96    0.77    1       0.92    1.2     1
[illegible]   2        0.83    0.7     0.99    0.88    1.2     1.1     1.5     1.6     2.2     2.3
              4        0.81    0.5     0.89    0.71    0.99    0.93    1.2     1.2     1.7     1.9
              10       0.87    0.6     0.89    0.82    0.91    0.84    0.92    0.99    1       1.4
[illegible]   2        1.2     1.2     1.6     2       1.8     2.4     2.1     3.1     2.8     4.4
              4        1.1     0.95    1.1     1.6     1.2     1.8     1.4     2       1.9     3
              10       0.94    1.1     0.87    1.1     0.87    1.1     0.87    1.1     1       1.6
[illegible]   2        0.97    0.92    1.1     1.6     1.3     1.5     1.5     1.9     2       2.6
              4        0.96    0.82    0.97    1.4     0.99    1.3     1.1     1.5     1.4     2
              10       0.9     1.2     0.86    0.99    0.83    0.98    0.82    1.1     0.89    1.2



Table 11. Ratio between MISE when ignoring noise and MISE with correctly 
specified error density. 



